Fast Access to Columnar, Hierarchical Data via Code Transformation
Abstract
Big Data query systems represent data in a columnar format for fast, selective access and, in some cases (e.g., Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding the runtime cost of that materialization. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into an object form, usually at a performance cost. This paper describes a new technique to transform procedural code so that it operates on columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We also present performance comparisons between transformed code and conventional object-oriented code in a High Energy Physics context.
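To make the idea of "rewriting references to objects as columnar array lookups" concrete, here is a minimal hand-written sketch in Python. It is not the paper's compiler pass; it only shows, side by side, an object-style nested loop and the kind of code such a rewrite might produce. The names `Muon`, `Event`, `to_columns`, `offsets`, and `muon_pt` are all hypothetical, and the offsets-plus-flat-array layout is an assumed (though common) way to store variable-length nested lists in columns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Muon:            # hypothetical particle record
    pt: float
    eta: float


@dataclass
class Event:           # hypothetical event holding a variable-length list
    muons: List[Muon]


def max_pt_objects(events: List[Event]) -> List[float]:
    """Object-style code: nested loop over materialized objects."""
    out = []
    for event in events:
        best = 0.0
        for muon in event.muons:
            if muon.pt > best:
                best = muon.pt
        out.append(best)
    return out


def to_columns(events: List[Event]) -> Tuple[List[int], List[float], List[float]]:
    """Assumed columnar layout: one flat array per attribute plus an
    offsets array; event i owns indices offsets[i] .. offsets[i+1]-1."""
    offsets, muon_pt, muon_eta = [0], [], []
    for event in events:
        for muon in event.muons:
            muon_pt.append(muon.pt)
            muon_eta.append(muon.eta)
        offsets.append(len(muon_pt))
    return offsets, muon_pt, muon_eta


def max_pt_columnar(offsets: List[int], muon_pt: List[float]) -> List[float]:
    """Same logic after the rewrite: `event.muons` becomes an index range,
    `muon.pt` becomes `muon_pt[j]`; no Muon or Event object is built."""
    out = []
    for i in range(len(offsets) - 1):
        best = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            if muon_pt[j] > best:
                best = muon_pt[j]
        out.append(best)
    return out


events = [
    Event([Muon(10.0, 0.1), Muon(25.5, -1.2)]),
    Event([]),
    Event([Muon(7.0, 2.0)]),
]
offsets, muon_pt, muon_eta = to_columns(events)
print(max_pt_objects(events))             # [25.5, 0.0, 7.0]
print(max_pt_columnar(offsets, muon_pt))  # [25.5, 0.0, 7.0]
```

Note that the columnar version never reads `muon_eta`: only the columns the code actually touches need to be loaded, which is where the selective-access benefit of a columnar format comes from.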
Journal: CoRR
Volume: abs/1708.08319
Publication date: 2017